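Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate, leading to fast asymptotic convergence but accepting that the algorithm may have an arbitrarily long burn-in phase, or 2) choose a smaller learning rate, leading to immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems relative to this approach, such as deriving a practical update rule for the optimal unrolling strategy and making novel connections with the field of Sobolev orthogonal polynomials.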
translated by Google Translate
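As an illustration of the unrolled differentiation described in the first abstract, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical diagonal quadratic f(x, theta) = 0.5 * sum((e_i + theta) * x_i^2) - sum(b_i * x_i), so that the exact Jacobian is available in closed form for comparison. The names `unrolled_gd_jacobian` and `exact_jacobian` are illustrative only.

```python
def unrolled_gd_jacobian(eigs, b, theta, lr, steps):
    """Run gradient descent and propagate dx_t/dtheta through every step."""
    h = [e + theta for e in eigs]  # diagonal Hessian: diag(eigs) + theta * I
    x = [0.0] * len(b)             # iterate x_t, started at zero
    jac = [0.0] * len(b)           # dx_t/dtheta, zero because x_0 is constant
    for _ in range(steps):
        # Differentiate x <- x - lr * (h * x - b) w.r.t. theta (dh/dtheta = 1).
        # The Jacobian update must use x_t, so it runs before x is overwritten.
        jac = [j - lr * (hi * j + xi) for j, hi, xi in zip(jac, h, x)]
        x = [xi - lr * (hi * xi - bi) for xi, hi, bi in zip(x, h, b)]
    return x, jac


def exact_jacobian(eigs, b, theta):
    """x*_i = b_i / (e_i + theta), hence dx*_i/dtheta = -b_i / (e_i + theta)^2."""
    return [-bi / (e + theta) ** 2 for e, bi in zip(eigs, b)]


eigs, b, theta = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 0.5
x_T, jac_T = unrolled_gd_jacobian(eigs, b, theta, lr=0.25, steps=400)
jac_star = exact_jacobian(eigs, b, theta)
```

Recording the gap between `jac_T` and `jac_star` for increasing `steps` and different values of `lr` is one way to observe how the Jacobian error evolves on such a toy problem.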
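We consider, in a non-asymptotic way, the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE) and of the conjugate maximum a posteriori (MAP) estimate. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few-samples regime. After exhibiting various facets of the problem, we show that we can interpret the MAP as running stochastic mirror descent (SMD) on the log-likelihood. However, modern convergence results do not apply to standard examples of the exponential family, highlighting holes in the convergence literature. We believe that solving this very fundamental problem may bring progress to both the statistics and optimization communities.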
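The link between the MAP and stochastic mirror descent can be made concrete on a Bernoulli model. The sketch below rests on assumptions not stated in the abstract: the mirror map is taken to be the log-partition function, so the update acts on the mean parameter, and the step size at step t is 1 / (n0 + t), where `n0` is a hypothetical prior pseudo-count. Under those assumptions the recursion reproduces the closed-form conjugate MAP in mean parameters.

```python
def map_mean_closed_form(samples, prior_mean, n0):
    """Conjugate MAP in mean parameters: prior pseudo-counts plus observed data."""
    return (n0 * prior_mean + sum(samples)) / (n0 + len(samples))


def map_mean_mirror_descent(samples, prior_mean, n0):
    """One pass of stochastic mirror descent, written in the mean parameter mu."""
    mu = prior_mean
    for t, x in enumerate(samples, start=1):
        step = 1.0 / (n0 + t)  # assumed step-size schedule
        # Stochastic gradient of the negative log-likelihood w.r.t. the natural
        # parameter is mu - T(x), with sufficient statistic T(x) = x here.
        mu = mu - step * (mu - x)
    return mu


samples = [1, 0, 1, 1, 0]
closed = map_mean_closed_form(samples, prior_mean=0.5, n0=2)
recursive = map_mean_mirror_descent(samples, prior_mean=0.5, n0=2)
```

Both functions return the same value, because the recursion is simply a running average initialised with the prior.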
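This monograph covers recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov and using a few master templates, such as that for optimized gradient methods, which have the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on knowledge of some of the regularity parameters of the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters.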
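To illustrate how momentum accelerates convergence on a quadratic, here is a minimal sketch that compares plain gradient descent with Polyak's heavy-ball method on a hypothetical diagonal quadratic f(x) = 0.5 * sum(e_i * x_i^2). The step size and momentum are the classical heavy-ball choices for known smallest and largest eigenvalues. This is not the Chebyshev method itself, and the function names are illustrative.

```python
import math


def gradient_descent(eigs, x0, steps):
    """Gradient descent with the step size 2 / (L + mu)."""
    mu, big_l = min(eigs), max(eigs)
    lr = 2.0 / (big_l + mu)
    x = list(x0)
    for _ in range(steps):
        x = [xi - lr * e * xi for xi, e in zip(x, eigs)]
    return x


def heavy_ball(eigs, x0, steps):
    """Polyak heavy-ball with the classical tuning for eigenvalues in [mu, L]."""
    mu, big_l = min(eigs), max(eigs)
    root_sum = math.sqrt(big_l) + math.sqrt(mu)
    lr = 4.0 / root_sum ** 2
    beta = ((math.sqrt(big_l) - math.sqrt(mu)) / root_sum) ** 2
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        # Gradient step plus a momentum term along the previous displacement.
        x_next = [xi - lr * e * xi + beta * (xi - xp)
                  for xi, xp, e in zip(x, x_prev, eigs)]
        x_prev, x = x, x_next
    return x


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


eigs, x0 = [1.0, 10.0, 50.0, 100.0], [1.0, 1.0, 1.0, 1.0]
err_gd = norm(gradient_descent(eigs, x0, steps=100))
err_hb = norm(heavy_ball(eigs, x0, steps=100))
```

With condition number 100, the distance to the minimizer (the origin) after 100 steps is several orders of magnitude smaller for the heavy-ball iterate than for gradient descent.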
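We develop a framework for the average-case analysis of random quadratic problems and derive algorithms that are optimal under this analysis. This yields a new class of methods that achieve acceleration given a model of the Hessian's eigenvalue distribution. We develop explicit algorithms for the uniform, Marchenko-Pastur, and exponential distributions. These methods are momentum-based algorithms whose hyperparameters can be estimated without knowledge of the Hessian's smallest singular value, in contrast with classical accelerated methods such as Nesterov acceleration and Polyak momentum. Through empirical benchmarks on quadratic and logistic regression problems, we identify regimes in which the proposed methods improve over classical (worst-case) accelerated methods.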
Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address that issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the $Q$ function, which generally stands at the core of learning schemes in RL, with another function that takes into account both the expected return and the risk. Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm. This makes it possible to span the complete potential trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, and with an emphasis on the interpretability of the resulting decision-making process.
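The abstract does not give the formula for $U$, so the following is only a sketch under stated assumptions: $Z$ is represented by equally weighted return samples per action, the risk term is the mean of the worst `rho` fraction of those samples (a CVaR-style tail measure), and $U$ is a convex combination of the expected return and that tail term, weighted by a hypothetical parameter `alpha`. All function names are illustrative.

```python
def lower_tail_mean(returns, rho):
    """Mean of the worst rho-fraction of equally weighted return samples."""
    ordered = sorted(returns)
    k = max(1, int(round(rho * len(ordered))))  # keep at least one sample
    return sum(ordered[:k]) / k


def risk_based_utility(returns, alpha, rho):
    """Assumed form: alpha * expected return + (1 - alpha) * tail risk term."""
    expected = sum(returns) / len(returns)
    return alpha * expected + (1 - alpha) * lower_tail_mean(returns, rho)


def greedy_action(return_samples, alpha, rho):
    """Pick the action whose return distribution maximises the utility."""
    return max(return_samples,
               key=lambda a: risk_based_utility(return_samples[a], alpha, rho))


# Hypothetical return samples for two actions in a single state.
return_samples = {"safe": [1.0, 1.0, 1.0, 1.0], "risky": [-4.0, 2.0, 3.0, 5.0]}
risk_neutral_choice = greedy_action(return_samples, alpha=1.0, rho=0.25)
risk_sensitive_choice = greedy_action(return_samples, alpha=0.5, rho=0.25)
```

With `alpha=1.0` the utility reduces to the expected return and the higher-mean action is selected; lowering `alpha` shifts the choice toward the action with the better tail.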
Deep learning models are being increasingly applied to imbalanced data in high stakes fields such as medicine, autonomous driving, and intelligence analysis. Imbalanced data compounds the black-box nature of deep networks because the relationships between classes may be highly skewed and unclear. This can reduce trust by model users and hamper the progress of developers of imbalanced learning algorithms. Existing methods that investigate imbalanced data complexity are geared toward binary classification, shallow learning models and low dimensional data. In addition, current eXplainable Artificial Intelligence (XAI) techniques mainly focus on converting opaque deep learning models into simpler models (e.g., decision trees) or mapping predictions for specific instances to inputs, instead of examining global data properties and complexities. Therefore, there is a need for a framework that is tailored to modern deep networks, that incorporates large, high dimensional, multi-class datasets, and uncovers data complexities commonly found in imbalanced data (e.g., class overlap, sub-concepts, and outlier instances). We propose a set of techniques that can be used by both deep learning model users to identify, visualize and understand class prototypes, sub-concepts and outlier instances; and by imbalanced learning algorithm developers to detect features and class exemplars that are key to model performance. Our framework also identifies instances that reside on the border of class decision boundaries, which can carry highly discriminative information. Unlike many existing XAI techniques which map model decisions to gray-scale pixel locations, we use saliency through back-propagation to identify and aggregate image color bands across entire classes. Our framework is publicly available at \url{https://github.com/dd1github/XAI_for_Imbalanced_Learning}
A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground-truth selection, which allow for the evaluation of feature importance maps. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations.
In this paper, we identify the best learning scenario to train a team of agents to compete against multiple possible strategies of opposing teams. We evaluate cooperative value-based methods in a mixed cooperative-competitive environment. We restrict ourselves to the case of a symmetric, partially observable, two-team Markov game. We selected three training methods based on the centralised training and decentralised execution (CTDE) paradigm: QMIX, MAVEN and QVMix. For each method, we considered three learning scenarios differentiated by the variety of team policies encountered during training. For our experiments, we modified the StarCraft Multi-Agent Challenge environment to create competitive environments where both teams could learn and compete simultaneously. Our results suggest that training against multiple evolving strategies achieves the best results when, for scoring their performances, teams are faced with several strategies.
Words of estimative probability (WEP) are expressions of a statement's plausibility (probably, maybe, likely, doubt, unlikely, impossible...). Multiple surveys demonstrate the agreement of human evaluators when assigning numerical probability levels to WEP. For example, highly likely corresponds to a median chance of 0.90±0.08 in Fagen-Ulmschneider (2015)'s survey. In this work, we measure the ability of neural language processing models to capture the consensual probability level associated with each WEP. Firstly, we use the UNLI dataset (Chen et al., 2020), which associates premises and hypotheses with their perceived joint probability p, to construct prompts, e.g. "[PREMISE]. [WEP], [HYPOTHESIS]." and assess whether language models can predict whether the WEP consensual probability level is close to p. Secondly, we construct a dataset of WEP-based probabilistic reasoning, to test whether language models can reason with WEP compositions. When prompted "[EVENTA] is likely. [EVENTB] is impossible.", a causal language model should not express that [EVENTA&B] is likely. We show that both tasks are unsolved by off-the-shelf English language models, but that fine-tuning leads to transferable improvement.
Neural networks trained with ERM (empirical risk minimization) sometimes learn unintended decision rules, in particular when their training data is biased, i.e., when training labels are strongly correlated with undesirable features. To prevent a network from learning such features, recent methods augment training data such that examples displaying spurious correlations (i.e., bias-aligned examples) become a minority, whereas the other, bias-conflicting examples become prevalent. However, these approaches are sometimes difficult to train and scale to real-world data because they rely on generative models or disentangled representations. We propose an alternative based on mixup, a popular augmentation that creates convex combinations of training examples. Our method, coined SelecMix, applies mixup to contradicting pairs of examples, defined as showing either (i) the same label but dissimilar biased features, or (ii) different labels but similar biased features. Identifying such pairs requires comparing examples with respect to unknown biased features. For this, we utilize an auxiliary contrastive model with the popular heuristic that biased features are learned preferentially during training. Experiments on standard benchmarks demonstrate the effectiveness of the method, in particular when label noise complicates the identification of bias-conflicting examples.
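The pair-selection rule in the SelecMix abstract can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: biased features are stood in for by hypothetical embedding vectors, similarity is measured by squared Euclidean distance, the partner is chosen deterministically as the most extreme candidate, and label handling for the mixed example is omitted.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pick_contradicting_partner(anchor, labels, bias_feats, same_label):
    """Return the index of a contradicting partner for `anchor`, or None.

    same_label=True:  same label, most dissimilar biased features (case i).
    same_label=False: different label, most similar biased features (case ii).
    """
    candidates = [j for j in range(len(labels))
                  if j != anchor and (labels[j] == labels[anchor]) == same_label]
    if not candidates:
        return None

    def dist(j):
        return sq_dist(bias_feats[anchor], bias_feats[j])

    return max(candidates, key=dist) if same_label else min(candidates, key=dist)


def mixup(x_a, x_b, lam):
    """Convex combination of two inputs with mixing coefficient lam in [0, 1]."""
    return [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]


# Hypothetical toy data: four examples with 1-D biased-feature embeddings.
labels = [0, 0, 1, 1]
bias_feats = [[0.0], [5.0], [0.1], [4.0]]
partner_same = pick_contradicting_partner(0, labels, bias_feats, same_label=True)
partner_diff = pick_contradicting_partner(0, labels, bias_feats, same_label=False)
mixed = mixup([0.0, 2.0], [4.0, 6.0], lam=0.25)
```

In practice the biased features are unknown, which is why the abstract relies on an auxiliary contrastive model to provide the embeddings used for the comparison.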